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Abstract 

We present a new line-based discontinuous Galerkin (DG) discretization scheme for first- and second-order 
systems of partial differential equations. The scheme is based on fully unstructured meshes of quadrilateral 
or hexahedral elements, and it is closely related to the standard nodal DG scheme as well as several of 
its variants such as the collocation-based DG spectral element method (DGSEM) or the spectral difference 
(SD) method. However, our motivation is to maximize the sparsity of the Jacobian matrices, since this 
directly translates into higher performance in particular for implicit solvers, while maintaining many of the 
good properties of the DG scheme. To achieve this, our scheme is based on applying one-dimensional DG 
solvers along each coordinate direction in a reference element. This reduces the number of connectivities 
drastically, since the scheme only connects each node to a line of nodes along each direction, as opposed to 
the standard DG method which connects all nodes inside the element and many nodes in the neighboring 
ones. The resulting scheme is similar to a collocation scheme, but it uses fully consistent integration 
along each 1-D coordinate direction which results in different properties for nonlinear problems and curved 
elements. Also, the scheme uses solution points along each element face, which further reduces the number of 
connections with the neighboring elements. Second-order terms are handled by an LDG-type approach, with 
an upwind/downwind flux function based on a switch function at each element face. We demonstrate the 
accuracy of the method and compare it to the standard nodal DG method for problems including Poisson's 
equation, Euler's equations of gas dynamics, and both the steady-state and the transient compressible 
Navier-Stokes equations. We also show how to integrate the Navier-Stokes equations using implicit schemes 
and Newton-Krylov solvers, without impairing the high sparsity of the matrices. 
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1. Introduction 

In recent years it has become clear that the current computational methods for scientific and engineer- 
ing phenomena are inadequate for challenging problems. These include problems with propagating waves, 
turbulent fluid flow, nonlinear interactions, and multiple scales. This has resulted in a significant interest 
in so-called high-order accurate methods, which have the potential to produce fundamentally more reliable 
solutions. A number of numerical methods have been proposed, including multi-block finite difference meth- 
ods [1, 2, 3], high-order finite volume methods [4, 5], stabilized finite element methods [6], discontinuous 
Galerkin (DG) methods [7, 8, 9], DG spectral element methods (DGSEM) [10], spectral volume/difference 
methods [11, 12, 13, 14], and hybridized DG methods [15, 16]. All methods have advantages in particular sit- 
uations, but for various reasons most general purpose commercial-grade simulation tools still use traditional 
low-order methods. 
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Much of the current research is devoted to the discontinuous Galerkin method. This is partly because 
of its many attractive properties, including the use of fully unstructured simplex meshes, the natural stabi- 
lization mechanism based on approximate Riemann solvers, and the rigorous theoretical foundations. It can 
certainly be discussed why the DG method is not used routinely for real-world simulations, but one of the 
main reasons is clearly its high computational cost, which is still at least a magnitude more than low-order 
methods or high-order finite difference methods on similar grids. For some problems, explicit time- stepping 
or matrix-free implicit methods can be employed, but for many real-world problems and meshes full Jacobian 
matrices are required for the solvers to be efficient. Here, nodal-based Galerkin methods have a fundamental 
disadvantage in that they connect all unknowns inside an element, as well as all neighboring face nodes, 
even for first-order derivatives. This leads to a stencil size that scales like p D for polynomial degrees p in 
D spatial dimensions. As a contrast, a standard finite difference method only connects neighboring nodes 
along the D coordinate lines through the node. This gives a stencil size proportional to Dp, which in three 
dimensions can be magnitudes smaller ever for moderate values of p. 

Several high-order schemes for unstructured meshes have been proposed with a similar stencil-size reduc- 
tion. In particular, the DG spectral element method [10, 17] is a collocation-based method on a staggered 
grid which only uses information along each coordinate line for the discretized equations. Other closely 
related schemes have the same property, such as the spectral difference method [12], the flux reconstruction 
method [13, 14], and the DGM-FD method [18]. For the special case of a linear one-dimensional problem, 
many of these methods can be shown to be identical to the standard DG method [19], but in general they 
define different schemes with varying properties. 

In an attempt to further reduce the size of the Jacobians, and to ensure that the scheme is identical to the 
standard DG method along each line of nodes, we propose a new line-based DG scheme. Like the DGSEM, 
our Line-DG scheme is derived by considering only the 1-D problems that arise along each coordinate 
direction. We apply standard 1-D DG formulations for each of these sub-problems, and all integrals are 
computed fully consistently (with sufficient accuracy), which means in particular that the definition of the 
scheme makes no statement about flux points. We note that this can be done without introducing additional 
connectivities, since all nodes in the local 1-D problem are already connected by the shape functions. In 
addition, our scheme uses solution points along each clement face, which further reduces the number of 
connectivities with the neighboring elements. 

For the second-order terms in the Navier-Stokes equations, we use an LDG-type approach [20] with 
upwind/downwind fluxes based on consistent switches along all globally connected lines of elements. Special 
care is required to preserve the sparsity of the resulting matrices, and we propose a simple but efficient 
Newton-Krylov solver which splits the matrix product in order to avoid introducing additional matrix 
entries. Many options for preconditioning are possible, and in this work we use a block-Jacobi method with 
sparse blocks. 

Wc first describe the method for first-order systems in Section 2 and mention some practical implemen- 
tation issues, including a study of the structure of the Jacobian matrices. In Section 3 we extend the scheme 
to second-order systems using the LDG-type scheme, and in Section 4 we discuss the implicit temporal 
discretization, some approaches for maintaining the high sparsity of the discretization, and the Newton- 
Krylov solver. Finally, in Section 5 we show numerical results and convergence for Poisson's equation, an 
inviscid Euler vortex, and flow over a cylinder. We also compare the method to the standard nodal DG 
method, and we conclude that the differences are overall very small. For the Navier-Stokes equations, we 
show convergence of drag and lift forces for steady-state laminar flow around an airfoil, and we demonstrate 
our implicit time-integrators on a transient LES-type flow problem. 



2. Line-based discontinuous Galerkin discretization 

2.1. First-order equations 

Consider a system of m first-order conservation laws with source terms, 

du 



2 



in a three-dimensional domain fi, with solution it, flux function F(u), source function S(u), and appropriate 
boundary conditions on dfl. We will use a discretization of fl into non-overlapping, conforming, curved 
hexahedral elements. Within each element we introduce a Cartesian grid of (p-f-1) 3 node points, where p > 1, 
by defining a smooth one-to-one mapping given by a cliff comorphism x = x{X) between the reference unit 
cube V = [0, l] 3 and the element v, and setting jcy*. = x(Xijk), where Xijk = (si, Sj, Sk) for < i, j, k < p, 
and {si} is an increasing sequence of p + 1 node positions Si <G [0, 1] with so = and s p = 1 (see figure 1). 

To obtain our numerical scheme for approximating (1), we consider a single element v and its mapping 
x = x(X), and follow standard procedure to change independent variables from x to X. This transforms 
(f) into 

J-g + Vx • F(u) = JS(u), (2) 

in the reference domain V. Here we have defined the mapping Jacobian J — det(G) and the contravariant 
fluxes F = (fi, /2, /a) = JG~ X F, with the mapping deformation gradient G = Vx*. 

A standard nodal discontinuous Galerkin method would now consider the multivariate polynomial u(X) 
that interpolates the grid function, Uijk = u(Xijk), and define a numerical scheme for the spatial derivatives 
of (1) by a Galerkin procedure in V. Our approach differs in that it considers each of the three spatial deriva- 
tives in (2) separately and approximates them numerically using one-dimensional discontinuous Galerkin 
formulations along each of the 3(p + I) 2 curves defined by straight lines in the reference domain V ', through 
the three sets of nodes along each space dimension. 

More specifically, the (p + f) 2 curves along the first space dimension are Xjk(£) = x(£,Xj,Xk) for 
< j, k < p. On these we define the polynomial Ujk(^) & ^([0, l]) m that interpolates Uj^, i — 0, . . . ,p, 
and we define a numerical approximation rjk(Xi) to dfi/dXi by a one-dimensional Galerkin procedure: 
Find r jfc (£) € V p ([0, f]) m such that 



r 1 d/i(u Jfc (0) 



o Jo 
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for all test functions G ^([0, f]) m . Here, Uj k (l) is the numerical solution at Xjk(l + ) = x(l + , Xj,X k ), 
and similarly u~ k (0) at Xjk(0~) = x(0~ ,Xj,Xf.)- These will be given either by nodes in the neighboring 

elements or implicitly through the boundary conditions. Furthermore, /i(m_r,Wl) is a numerical flux func- 
tion for fi, but we note that with the reference normal direction TV^ = (1,0,0), the contravariant flux can 
be written 

fx = F ■ JV+ = (JG^F) -N+ =F (JG- T N+) = F ■ n+ (4) 

with the (non-normalized) normal vector — JG~ T N^~ at the boundary point cCjfc(l). Our numerical 
flux then becomes 

fi{u R ,u L ) = F ■ N+(u R ,u L ) = F ■ n+(u R ,u L ), (5) 

where F ■ n(u + , u~) is a standard numerical flux function used in finite volume and discontinuous Galerkin 
schemes, with normal direction n and traces in the positive/negative normal direction. This allows us 
to use existing flux functions and approximate Riemann solvers without modification. 

Similarly, for the second numerical flux we move the negative sign to the normal direction and define 
TVf = (-1,0,0) and n{ = JG' 7 'iVf , which is again an outward normal vector. We can then write: 

= F ■ N~(u L ,u R ) = F ■ n~(u L ,u R ), (6) 
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Figure 1: A two-dimensional illustration of the mapping from a reference element V to the actual curved element v, for the 
case p = 4. 

where we also have swapped the order of the arguments to the flux function F ■ n to be consistent with the 
negative normal direction. Our Galerkin scheme (3) then gets the final form: Find r,-fe(£) S ^([0, l]) m such 
that 

r jk (0 ■ v(0 d£ = F- n+(u+ k (1), u ifc (l)) • v(l) + F ■ n^(uj k (0),u jk (0)) ■ v(0) - J ■ ^ d£, 

" (7) 

for all t>(f) € -P p ([0,l]) m . 

We use a standard finite element procedure to solve (7) for T*jfc(£). Introduce the nodal Lagrange basis 
functions fa € V P ([0, 1]) such that (j>i{sj) = for i,j = 0, . . . ,p, and set 



p 

r 3k(Q =X) r Wi(f)i 



(8) 
(9) 



To find the m(p + 1) coefficients along the curve Xjk{£), we set v(£) = e^^), for each i = 0, . . . ,p and 
£ = 1, . . . , to, where (e^) n = <W for n = 1, .. . ,m. Our Galerkin scheme (7) then gets the discrete form 
Mrjk = b, and we find the coefficients rjk by solving to linear systems with the (p + l)-by-(p + 1) mass 
matrix TV/. Repeating the procedure for each j, k = 0, . . . ,p we obtain all coefficients r^j. — "' 
the grid function for our numerical approximation of dF\/dXi at each grid point x^. 



r ijk> which is 



(21 i > 

In an analogous way, we calculate coefficients r ■ J. and that approximate dF^/dXz and dF 3 /dX 3 , 



(3) 



ijfc 



ijfc 



respectively, at the grid points. The curves considered are now x^ = a;(Xj,£, Xj.) and = x(Xi,Xj,£), 
and with the reference normals TV^ = (0, ±1,0) and TV^ = (0,0, ±1) the contravariant fluxes fi and / 3 
can again be written a.s F ■ n where n — JG~ T N is a non-normalized normal vector to the element at the 
boundary points. The solution procedure involves the same mass matrix M and is identical to before. 
Using the calculated numerical approximations to each partial derivative in (2), we obtain our final 
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semi-discrete formulation: 




where J i]k = J(x ijk ). 

2.2. Implementation details 

For the mapping x(X) it is natural to use an iso-parametric approach. The node positions Xij k are 
given by some curved mesh generation procedure [21], and we define 

v 

x(x)= x ijk 4> i {x 1 )4 >j (x 2 ) ( i) k {x 3 ), (ii) 

i,j,k=0 

which clearly satisfies our interpolation requirement 

p p 
x(X ijk ) = ^2 Xi'j' k '<Pi'(si)(j)j'(sj)<f)k'{sk) = X] x i'j'k'5u>8jj>5h k > = Xijk- (12) 

i'.j',k'—0 i'^j'.k'—Q 

This allows us to easily compute G(X) at any point -X", which will involve the derivatives of the shape 
functions. To evaluate the one-dimensional integrals in (7), we use Gauss-Legendre integration of sufficiently 
high degree. For all our problems, a precision of 3p appears to be enough, so we use integration rules with 
[(3p + l)/2] integration points. 

The computation of the discretization (7) is remarkably simple compared to a nodal DG scheme, primarily 
because (a) The integrals are only one-dimensional, and (b) The numerical fluxes are only evaluated point- 
wise. We note that the left-hand side of (7), which contributes to the mass matrix M, is constant regardless 
of solution component, line, and element (even if the actual mapped element is curved). Therefore it 
can be pre-computed and pre-factorized using a standard Cholesky method. Furthermore, many lines and 
components can be processed simultaneously, which might further increase the performance through the use 
of BLAS3-type cache-optimized linear algebra libraries. 

For the integral in the right-hand side of (7), the term dv/d^ is again constant for all components, lines, 
and elements, so its discretization at the Gauss integration points can be pre-computed and combined with 
the inverted mass matrix and the Gauss integration weights w. For non- linear problems, the only part that 
requires re-evaluation at each Gauss integration point is f±(uj k (^)), although the deformation gradient G 
can be pre-computed if necessary. 

For the numerical fluxes, we pointed out above that (5) and (6) have exactly the same form as standard 
numerical flux functions. We pre-compute the outward normals ri\ and n~, for i — 1,2,3, at all boundary 
nodes. Note that our scheme only computes point-wise numerical fluxes, unlike the nodal DG method which 
involves integrals of the numerical fluxes. Sometimes existing numerical flux functions require the normal 
vector to be of unit length, in this case we normalize n = n/\n\ and use the fact that 

F^n = \n\F~^h. (13) 

Finally, the multipliers J^ k ar e defined at the node points (not the Gauss integration points), and can 
also be pre-computed. 

2.3. Stencil size and sparsity pattern 

To illustrate the drastic reduction of the number of entries in the Jacobian matrices for the Line-DG 
method, consider the (p + l) 3 nodes in an (interior) element and its six neighboring elements. For a first- 
order operator, we note that a standard nodal DG formulation will in general produce full block matrices, 
that is, each degree of freedom will depend on all the other ones within the element. In addition, the 
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DGSEM/SD connectivities 
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25 
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Nodal DG connectivities 


20 


45 


88 


155 


252 


385 


560 


783 


1060 


1397 



Table 1: The number of connectivities per node for a first-order operator and 2-D quadrilateral / 3-D hexahedral elements 
with the Line-DG, the DGSEM/SD, and the nodal DG methods. 



face integrals will connect all nodes on an element face to all neighboring element face nodes. This gives 
6(p + l) 2 (p + l) 2 = 6(p + l) 4 additional connections per element, or in average 6(j> + l) 4 /(p + l) 3 = 6(p + 1) 
connections per degree of freedom. In total, the average number of connections is (p+l) 3 + 6(p+l), which 
illustrates why matrix-based DG methods are considered memory intensive and expensive even at modest 
values of p. 

As a contrast, in our line-based method each node will only connect to other nodes within the same lines, 
and to only one node in each neighboring element, for a total of (3p + 1) + 6 = 3p + 7 connectivities. This 
is similar to that of the DGSEM/SD methods, although with Gauss-Legendre solution points these schemes 
also connect entire lines of nodes in the neighboring element, giving a total of (3p + 1) + 6(p +1) = 9p + 7 
connectivities. These numbers are tabulated for a range of degrees p in three dimensions in table 1. The 
sparsity patterns are illustrated in figure 2 for two-dimensional quadrilateral elements, for all three methods. 
The connectivities are shown both by a nodal plot, with bold nodes corresponding to the dependencies of 
the single red node, and by sparsity plots of the Jacobian matrices. 

We note that in three dimensions, already for p = 3 the Line-DG method is 5.5 times sparser than nodal 
DG, and for p = 10 it is almost 40 times sparser. This reduction in stencil size translates into lower assembly 
times, but more importantly, for matrix-based solvers it means drastically lower storage requirements and 
faster matrix-vector products for iterative implicit solvers. 



3. Second-order equations 

We now consider the discretization of equations with second-order derivatives, in the form of a system 
of conservation laws 

Olii 

— + V • F(u,Vu) = S(u,Vu). (14) 
at 

We first use a standard technique in many finite difference and discontinuous Galerkin methods, and intro- 
duce the auxiliary variables q and rewrite as a split system 

^+V-F(u,q) = S(u,q), (15) 
Vu = q. (16) 

This essentially has the form of our first-order system (1), and we can apply Line-DG to each solution 
component as described above. More specifically, the change of variables from x to X transforms (15), (16) 
into 

OUL — 

J 7 - + Vx-F(u,q) = JS{u,q), (17) 
ot 

V x ■ u(u) = Jq, (18) 
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Figure 2: The connectivities (blue circles) to a single node (red circle) for the Line-DG method, the DGSEM/SD method, and 
the nodal DG method (2-D quadrilateral elements, a first-order operator). 
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where u = (ui,U2,u%) = u<Ei JG~ X . We discretize (17) as described before for the first-order case, treating 
q as additional solution components. For (18), we use a completely analogous procedure. We introduce the 
grid function qij k = q(Xijk), and along each curve Xjfc(£) we define the polynomial qjk(0 € V p ([0, l]) mx3 
that interpolates qiyfe, i = 0,...,p. We find the numerical approximation djk(Xx) to dui/dXi by the 
Galerkin formulation: Find d ]k {Cj e P p ([0, l]) mx3 such that 

= Si(tt+(1), q+(l), u jk (l), q 3k (l)) : r(l) - fii(« jfc (0), g, fe (0), uT fc (0),g- (Q)) : T (0) 

-jT 5i(« ifc (0):^de (19) 

for all test functions t(£) € ^([0, l]) mx3 . Note that we allow for the numerical flux u\ to depend on both u 
and q on each side of the face, even though the actual flux U\ is only a function of u. Again, the numerical 
contravariant fluxes can be written in terms of the actual fluxes and the actual normal vector: 

ui=u - 7V+ = u ® nf = u ® nf, (20) 

and similarly in the negative direction and along the other coordinate directions. It remains only to define 
the numerical fluxes F ■ n = F ■ n and u. We could in principle consider any scheme that can be written in 
this form [22], such as the interior penalty method, the BR2 method, the LDG method [20], and the CDG 
method [23]. Here we use a scheme based on the LDG method, because it has a simple upwind/downwind 
character, it does not evaluate derivatives of grid functions at the boundaries, and it appears well-suited for 
our Line-DG discretization. Furthermore, since our implicit solvers avoid the elimination of q, the scheme 
has a compact connectivity (only connects neighboring elements). 
First, we separate the fluxes F into an inviscid and a viscous part: 

F(u, Vu) = F inv (u) + F vis {u, Vu). (21) 

This decomposition is clearly not unique, but it is understood that for many problems there is a natural 
separation into a convection-dominated inviscid component and a diffusion-dominated viscous component. 
This allows us to use standard approximate Riemann solvers for F mv as before, and we will now consider 
only the treatment of the viscous fluxes F Vls . For shorter notation, we assume below that F = F vls and 
that n is a unit vector. 

We will define the fluxes in terms of a so-called switch function, which simply assigns a sign to each 
internal element face. For one-dimensional problems, the natural switch function is to set all these signs 
equal (either positive or negative), and we will mimic this for our Line-DG method by identifying globally 
connected lines in our hexahedral meshes. 

In our notation, instead of assigning switches to each face, we introduce S t €{ — 1,1} for the switches at 
local coordinate £ = 1 and £ = along direction i = 1, 2, 3. There is some redundancy here, since we require 
that = —S^, and also that the switch function for a shared face between two neighboring elements have 
opposite signs. See figure 3 for an example quadrilateral mesh and switch function. This was generated 
by a straight-forward algorithm, where an arbitrary element face is chosen and assigned an arbitrary sign, 
which then defines the alternating pattern along a sequence of elements in both directions. This procedure 
is repeated until all faces have been processed. 

With the switch function defined, we can formulate the LDG numerical fluxes for the second-order terms: 

F(u, q, n) = {{F{u, q)}} + C n [u ® nj + C 12 ® [F(u, q) ■ n], (22) 
u(u, q, n) = {{«}} - Cu - [u ® n] + C 22 [F(m, q) ■ nj. (23) 

for a solution u, q and a face normal vector n. Here, {{•}} denotes the mean value and [•] denotes the jump 
over a face: 

{{v}} = i(t>+ + v~), lvQn] = v + Qn-v-Qn (24) 
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Figure 3: A sample quadrilateral mesh and switch function S i for i = 1,2 in each element. Note that the switches have 
consistent directions along each one-dimensional global curve (dashed blue lines). 



where v + is the quantity v on the positive side of the face (according to the normal n), v~ is v on the 
negative side, and is any multiplication operator. The coefficients Cn, C12, C22 give the scheme different 
properties, and we note in particular that: 

• If C22 = 0, the fluxes (23) do not depend on q, which means the discretized equation (18) immediately 
give q within each element (no coupling to equation (17)). 

• For the particular choice C12 = nS' l ± /2, where Sf is the switch for the considered face and direction, 
the fluxes can be written in the form: 



F(u R ,q R ) if S± =+1 
F(u L ,q L ) if Sf = -1 



F(u R ,q Il ,u L ,q L ,n) = C 11 lu®n} + { J /™ ;r J ± \ (25) 



u(u R) q Rl u L ,q L ,n) = C 2 2{F(u,q) ■ n\ + \ UL ^ S * + ] (26) 

[u R if SI = -1 

where it is clear how the method is upwinding/downwinding the two numerical fluxes, depending on 
the switch S t . The constants C\\ and C22 are additional stabilization parameters, which can be 
seen as penalties on the jumps in the solution and in the normal fluxes, respectively. In many of our 
problems we set both of these coefficients to zero (the so-called minimal dissipation LDG method [24]). 
This makes the scheme particularly simple, and also further reduces the number of connectivities in 
the Jacobian matrices. 

At the boundaries we impose conditions by appropriate choices of numerical fluxes. For example, at a 
Dirichlet-type boundary with a prescribed solution u — gp, we set: 

F(u,q,n) = F(u,q)+Cu(u-g D )®n (27) 
u(u,q,n)=g D , (28) 

where C\\ in (27) must be positive, even though we often choose C\\ = for the interior fluxes. At a 
Neumann- type boundary with prescribed normal fluxes F ■ n = g^, we set: 

F(u,q,n) = g N <E>n (29) 
u(u, q,n)=u C 22 {F(u, q) n- g N ). (30) 
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For mixed conditions we apply combinations of these fluxes for the different components of u and q. 

With the fluxes defined, we can calculate d^k = d\j k for all j,k = 0,. . . ,p, and similarly for dy]. and d\^ k 
along the other two coordinate directions. These are essentially numerical approximations to the gradient 
q = Vw, but again we point out that they might depend implicitly on q through the numerical fluxes (if 
C22 7^ 0). Our final semi-discrete formulation for (17), (18) gets the form 

S(u ijk ,q ijk ) (31) 
qijk (32) 



du 



Ijk 



dt 



1 

J ijk 



,(n) 



3 

n=l 

1 3 ( ) 
J— ^2 d ijk 



4. Temporal discretization and nonlinear solvers 

4-1- Method of lines and time integration 

We use various techniques to solve the semi-discrete system of equations (31), (32), either by integrating 
in time or solving for steady-state solutions. First, we define the vectors U, Q with all solution components 
UijkiQijk^ respectively, and write the system as 



dU 

~dt 

Q 



R(U,Q), 
D(U,Q). 



(33) 
(34) 



This split form can be useful for implicit time-stepping or steady-state solutions, in particular if the coefficient 
C22 7^ 0. With a standard Newton's method, this requires the solution of linear systems involving the matrix 



K 



OR 
dU 
OP 
dU dQ 



c)R 
i)Q 



K n K 12 
K21 K22 



(35) 



This system solves for both U and Q but it retains the high level of sparsity of the method. Also, it allows 
for non-zero -K22 which can be used to give the scheme several attractive properties [15]. In our examples, 
we solve these equations using a standard sparse direct solver [25]. 

However, in most of our problems we set C22 = to allow for elimination of the discrete derivatives Q. 
Then D(U,Q) = D(U), and substituting (34) into (33) leads to a reduced system 



dU 

~dt 



R(U,D(U)) = F(U). 



(36) 



This is clearly the preferred choice for explicit time-stepping, since it is a regular system of ODEs. In 
our examples we use a standard fourth-order explicit Runge-Kutta method. We also use this form for 
implicit time-stepping using Diagonally Implicit Runge-Kutta (DIRK) schemes [26]. In particular, we use 
the following L-stable, three-stage, third-order accurate method [26]: 



Ki = F(u n + AtJ2a ij K j ), i = l 



U n+1 = U n + AtJ2 b J K . 



(37) 
(38) 



with s = 3 and the coefficients given by the Runge-Kutta tableaux below. 
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a = 0.435866521508459 
r 2 = (l + a)/2 
b l = -(6a 2 - 16a+l)/4 
b 2 = (6a 2 -20a + 5)/4 

We also use implicit time-stepping for computing steady-state solutions, by a sequence of increasing timesteps 
At and a final step without the time derivatives. Since this does not require time-accuracy, we use a standard 
backward Euler scheme. We solve the nonlinear systems (37) using Newton's method with preconditioned 
iterative solvers, as described below. 



a 


a 










t 2 - a 


a 





1 


h 


h 


a 




bi 


b 2 


a 



4-2. Newton- Krylov solvers 

When Newton's method is applied to the reduced problem (36), it requires the solution of systems of 
equations of the form 

(I - aAtA)AU [l) = AtR{U {i \D(U^)) (39) 

where At is the timestep and 

. dR dR dRdD „ „ „ 
A =dU = dU + dQ8U =K ^ +K ^- (40) 

Forming this matrix A has the drawback that for second-order systems, the product K\ 2 K 2 \ is in 
general much less sparse than the individual matrices Ku, K\ 2l K 2 \. This is expected due to the repeated 
differentiation along two different directions, but it requires special solvers to avoid explicitly forming the 
denser matrix A. This phenomenon is not unique for our method, in fact many other numerical schemes 
including finite difference methods and nodal DG methods suffer from sparsity reduction for second-order 
systems. 

In this work, we use a simple approach to solve the system (39) without forming the full Jacobian matrix. 
In a preconditioned Krylov subspace method, we need to perform two operations: Multiplication of a vector 
p by the matrix (I — a At A), and approximate solution of (I — aAtA)x — b for the preconditioning. The 
matrix- vector product can by computed by keeping the individual matrix in a separated form and nesting 
the products: 

(I - aAtA)p = p-aAt {K n p + K 12 (K 21 p)) . (41) 

This avoids explicitly forming the matrix A, and the cost per matrix-vector product is proportional to the 
number of entries in the matrices Ku> K\ 2 , K 2 \. 

For preconditioning, we use a sparse block-Jacobi approach which forms an approximate matrix A that 
ignores all fill from the product K\ 2 K 2 \ and any inter-element connectivities. In other words, A is equal to 
A only at the block-diagonal Line-DG sparsity pattern, and zero everywhere else. This simple preconditioner 
requires very little storage (even less than a first-order discretization or the matrix Ku) and it performs 
well for time-accurate simulations with small timesteps. It can certainly be improved upon, for example 
allowing higher levels of fill or using block-ILU and /i/p- multigrid schemes [27]. 

When solving the linear systems involving the preconditioning matrix (J — a At A), we use a sparse direct 
LU- factorization with fill-reducing ordering for each block [25]. This results in some additional fill, but in 
our 2-D examples we find that even for p as high as 7 the number of entries in A is about the same as in 
the original sparse matrices Ku, K\ 2 , and K 2 i. For 3-D problems, it is more critical to retain the line- 
based sparsity in the preconditioner, and a number of alternatives should be applicable such as low-order 
approximations [28], AD I- iterations [29], and subiterations [30]. 

In our implementation, we store all matrices in a general purpose compressed column storage format 
[25]. We also point out that the matrix K 2 \ can be handled very efficiently, since it is a discrete gradient 
operator and therefore (a) linear, (b) constant in time, and (c) equal for all solution components (except 
possibly at the boundaries for certain boundary conditions). 
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Coarsest mesh, p = 5 One refinement, p = 5 Solution u(x, y) 



Figure 4: The Poisson test problem (42). The left figure shows the coarse unstructured quadrilateral mesh, which is uniformly 
refined repeatedly, and the solution nodes for p = 5. The right figure shows the solution as colored contours. 



4-3. Re-using Jacobian matrices 

In many problems it is more computationally expensive to form the matrix J — aAtA than to solve 
the linear system (39). This is especially true for time-accurate integration where the timesteps At are 
relatively small (but still large enough to motivate the use of implicit solvers). We therefore use a standard 
technique for Newton's method that attempts to re-use old Jacobian matrices for many iterations, until the 
convergence is too slow as determined by the number of iterations exceeding a threshold number. In our 
numerical experiments this is sufficient to allow for a reuse of the Jacobian for a large number of Newton 
steps, DIRK stages, and timesteps at a time. 

The linear systems are solved using a preconditioned GMRES method [31], with the low-cost sparse 
block- Jacobi preconditioner A described above. These equations can be solved with relatively low accuracy 
using a small number of GMRES iterations, since we are using old Jacobian matrices which already limit 
the potential improvements from each Newton step. The tolerance in the Newton solver is set well below 
the estimated truncation error of the time integrator. 



5. Results 

5.1. Poisson's equation 

Our first test is Poisson's equation 

-V ■ (Vu) = f(x,y) (42) 

on the unit square domain f2 = [0, l] 2 . Dirichlet conditions are imposed at all the boundaries (dilo = dd) 
and we choose the analytical solution 

u(x, y) = exp [a sin(aa; 4- by) + /3 cos(cx + dy)] (43) 

with numerical parameters a — 0.1, j3 — 0.3, a = 5.1, b = — 6.2, c = 4.3, d — 3.4. We then solve (42) with 
Dirichlet boundary conditions grj(x,y) — u(x, y)\on D - The source term, f(x,y), is obtained by analytical 
differentiation of (43). 

We discretize SI using an unstructured mesh of quadrilateral elements, see Figure 4 (left). We solve the 
split system (33), (34) using a direct sparse solver, for polynomial degrees p = 1, ... ,7. We consider two 
sets of parameters: the minimal dissipation scheme with Cn = C22 = 0, which has the benefit that it allows 
for elimination of the gradients q, and the slightly over-stabilized scheme C\\ — C22 — 1/20, which may 
provide a higher-order of convergence for q [15]. 

The resulting infinity norm errors and rates of convergence are shown in table 2, for both the solution u 
and the gradients q and the two parameter cases. For the minimal dissipation scheme (top two tables), we 
observe the expected orders of convergence p + 1 and p for u and q, respectively. For the stabilized scheme 
(bottom two tables), we obtain a somewhat higher order for the q variables, which could be used as part of 
a postprocessing step to further increase the order of convergence for the solution u [16]. 
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Table 2: Convergence of u and q for the Poisson problem, with C12 = /2 and Cn = C22 = (top two tables), 

Cn = C22 = 1/20 (bottom two tables). For the first case we observe approximate rates of p + 1 for u and p for q, while the 
nonzero Cn, C22 case appears to give a significantly higher rate for q. A star symbol (*) indicates that the error is dominated 
by floating point rounding errors rather than the truncation error of the scheme. 



5.2. Euler vortex 

Next we consider the compressible Euler and Navier-Stokes equations, which we write in the form: 

9 (P u *) + ir(P u * u i+P) = +lS 1 fori = 1,2,3, (45) 



l t i P E) + l i {u Ap E + p)) = ^ + l- j{ u j r ij ), (46) 



where p is the fluid density, ui,U2,u 3 are the velocity components, and E is the total energy. The viscous 
stress tensor and heat flux are given by 

dui i duj 2 duk s \ j p, d f ^ p I 



- ''U, • ;>,, :u,,/-j and ^ = -prd x - 1 { E+ 'p'2 UkUk )- (47) 

Here, \i is the viscosity coefficient and Pr = 0.72 is the Prandtl number which we assume to be constant. 
For an ideal gas, the pressure p has the form 

p = (7 - l)p (e - ^Ufctifc^ , (48) 
13 



where 7 is the adiabatic gas constant. 

Our first model problem is the inviscid flow of a compressible vortex in a rectangular domain [32] . The 
vortex is initially centered at (xo,yo) and is moving with the free-stream at an angle 9 with respect to the 
cc-axis. The analytic solution at (x, y, t) is given by 

« = «- fcosg- e ^7 o) -^ e X p(//2)), 
V 27rr c J 

w = ^ fsin0 + ^7 i ex P (//2) , 

where f(x, y, t) = (1 — ((a; — xo) — wt) 2 — {{y — yo) — vt) 2 )/rl, M x is the Mach number, 7 = Cy/c^ = 1.4, and 
Woo j Poo j Poo are free-stream velocity, pressure, and density. The Cartesian components of the free-stream 
velocity are u — Uoo cos# and v — Uoo sin (9. The parameter e measures the strength of the vortex and r c is 
its size. 

We use a domain of size 20-by-15, with the vortex initially centered at (xoj2/o) = (5,5) with respect 
to the lower-left corner. The Mach number is Moo = 0.5, the angle 9 = arctanl/2, and the vortex has 
the parameters e = 0.3 and r c = 1.5. We use characteristic boundary conditions and integrate until time 
to = VlO 2 + 5 2 /10, when the vortex has moved a relative distance of (1, 1/2). 

We write the Euler equations as a first-order system of conservation laws (1), in the conserved variables 
(p, pu, pv, pE). The scheme (10) is implemented in a straight-forward way, and we use Roe's method for the 
numerical fluxes (5) [33]. The time-integration is done explicitly with the form (36) using the RK4 solver 
and a timestep At small enough so that all truncation errors are dominated by the spatial discretization. 
We start from a coarse unstructured quadrilateral mesh (figure 5, top left), which we refine uniformly a 
number of times, and we use polynomial degrees p ranging between 1 and 8. The top right plot also shows 
the density field for a sample solution. 

In the bottom plot of figure 5, we graph the maximum errors (discretely at the solution nodes) for all 
simulation cases, both for the Line-DG method and the standard nodal DG method. The results clearly 
show the optimal order of convergence 0(h p+1 ) for element size h for both methods, and that the Line-DG 
errors are in all cases very close to those of the nodal DG method. 

5.3. Inviscid flow over a cylinder 

Next we study a problem with a steady-state solution and curved boundaries, and solve the Euler 
equations for the inviscid flow over a half-cylinder with radius 1 at a Mach number of 0.3. Structured 
quadrilateral meshes are used, with strong element size grading to better resolve the region close to the 
cylinder (see figure 6, top left). The outer domain boundary is a half-cylinder with radius 10, where 
characteristic boundary conditions are imposed. Standard slip wall/symmetry conditions are used at the 
cylinder and at the symmetry plane. 

The steady-state solutions are found using a fully consistent Newton method, applied directly to the 
equations (10), with the linear systems solved using a direct sparse solver [25]. Starting the iterations from 
an approximate analytical solution, derived from a potential flow approximation, the solver converges to 
machine precision in 4 to 6 iterations. The solution is shown in the bottom left of figure 6, and the figures 
to the right show portions of the Jacobian matrices for both the Line-DG and the nodal DG method. This 
illustrates again the reduced sparsity of the Line-DG scheme, with about a factor of 4 fewer entries than 
nodal DG already in two space dimensions. 

To evaluate the accuracy and convergence of the scheme, in figure 7 we plot the errors in the lift coefficient 
Cl (left) and the maximum errors in the entropy (right). These plots again confirm the convergence of the 
schemes as well as the minor differences in error between the Line-DG and the nodal DG schemes. 

5.4- Laminar flow around airfoil 

An example of a steady-state viscous computation is shown in figure 8. The compressible Navier-Stokes 
equations are solved at Mach 0.2 and Reynolds number 5000, for a flow around an SD7003 airfoil. The 



P = Poo 1 



P = Poo 1 



e 2 (7~l)A^ 
8tt 2 

e 2 ( 7 -l)Ml 
8tt 2 



exp(/)) 



exp(/) 



(49) 
(50) 
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Coarsest mesh, with degree p = 7 



Solution (density) 




Typical element size h 

Figure 5: Convergence test for an Euler vortex test problem using the Line-DG method and the nodal DG method. The results 
show optimal order of convergence 0(h p+1 ) with very small differences between the two methods. 

quadrilateral mesh is fully unstructured except for a structured graded boundary layer region, with a total 
of 461 elements for the coarse mesh, and 1844 and 7376 elements for the once and twice refined meshes, 
respectively. With approximating polynomials of degree p = 6, this gives a total number of high-order nodes 
of 22,589 for the coarse mesh and 90,356 for the first refinement. 

We find the steady-state solution with 10 digits of accuracy in the residual using a consistent Newton's 
method, with pseudo-timestepping for regularization. A solution is shown in figure 8 (top right), for the 
coarse mesh with p = 7. In the bottom plots, we show the convergence of the drag and the lift coefficients, 
for a range of polynomial degrees p. While it is hard to asses the exact order of convergence from these 
numbers, it is clear that our scheme provides a high order of convergence even for these difficult derivative- 
based quantities. 

5. 5. Transient flow around airfoil at Re = 20, 000 

In our last example, we demonstrate time-accurate implicit solution of transient flow around an SD7003 
airfoil at Re = 20,000. The mesh is highly resolved in the boundary layer, however, for this Reynolds 
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Solution, Mach number nz = 1 1 7370 nz = 480346 

Figure 6: Inviscid flow over a cylinder. The plots show the coarsest grid used in the convergence study and the nodes for 
polynomial degree p = 7 (top left), the corresponding solution as Mach number color plot (bottom left, with zoom-in), and a 
sparsity plot of the Jacobian matrices for both Line-DG and nodal DG (right). 



number it is coarser than the flow features in much of the domain and the computations should therefore 
be considered an under-resolved ILES-type model [34]. 

The Mach number is 0.1 and the angle of attack is 30 degrees to force flow separation at the leading edge. 
We use the three-stage DIRK scheme (37), (38), solved with Newton's method as described in sections 4.2 
and 4.3. At each Newton step we perform 5 GMRES iterations, and if the number of Newton iterations 
exceeds 15 we recompute the Jacobian matrices. Our computational mesh has 1122 quadrilateral elements 
with polynomial degrees p — 7. The mesh and a solution at the normalized time of t — 1.76 are shown in 
the top plots of figure 9. 

We use the timestep At = 2 ■ 10 -4 , which is about 250 times larger than the largest stable explicit 
RK4 timestep, yet small enough to accurately capture most of the complex flow features. It is difficult to 
estimate the accuracy in the simulation, due to the under-resolved nature of LES and the high sensitivity of 
transitional flows. However, we have run the same problem using a nodal DG code which has been tested 
against other simulations as well as experiments [34] . The bottom left plot of figure 9 shows that the lift and 
drag forces on the airfoil agree well between the two schemes, until small perturbations have grown enough 
to cause large differences between the flows. 

We also plot the performance of the Newton solver, in the bottom right of figure 9. It shows how the 
number of Newton iterations per solve remains fairly constant, and about once every 100th timestep it 
reaches 16 which forces a recomputation of the Jacobian matrix. The average number of iterations per 
Newton solve for the entire simulation is 14. Because of the sparsity of the matrices and the splitting 
(41), these iterations are relatively inexpensive compared to residual evaluation. Our implementation is not 
optimized for performance, but the relative times for the four operations (1) GMRES iteration, (2) residual 
evaluation, (3) Jacobian evaluation, and (4) preconditioner factorization are roughly 1:4:40:16. Therefore, 
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Typical element size h Typica l e | ement size h 

Figure 7: The convergence of the lift coefficient Cl (left) and the entropy difference (right) for the inviscid flow over cylinder 
problem. The plots show a series of results for varying polynomial degrees and number of refinements, for the two methods 
Line-DG and nodal DG. 




Figure 8: Stationary flow around an SD7003 airfoil (top left: mesh, top right: Mach number), computed with the Line-DG 
method with p = 7, at free-stream Mach 0.2, zero angle of attack, and Reynolds number 5,000. The bottom plots show the 
convergence of Co and Cl, for a range of polynomial degrees and with 0, 1, or 2 uniform mesh refinements. 



since we only perform 5 GMRES iterations per solve, we spend about the same time in residual evaluation as 
in the linear solver. In this sense, the solver is similar to an explicit scheme in that it spends a large portion 
of its computational time in residual evaluations, which gives benefits e.g. in the parallelization on new 
parallel multicore computer architectures with limited memory bandwidth. The time for re-assembly and 
factorization of the preconditioner is negligible, since they are only performed once in about every 100th 
timestep, which corresponds to 1400 residual evaluations or 7000 GMRES iterations. We also point out 
that without preconditioning about 20 times more GMRES iterations are required for the same tolerance, 
showing that even our simple preconditioner makes a drastic difference on the convergence. 
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Time Newton solve # 



Figure 9: Implicit transient simulation of flow around an SD7003 airfoil, at 30 degrees angle of attack, Reynolds number 20,000, 
and Mach number 0.1. Line-DG with polynomial degrees of p = 7 is used for the spatial discretization, and a three stage, third 
order accurate DIRK scheme is used for time integration. The nonlinear systems are solved using a Newton-Krylov solver with 
re-used Jacobian matrices. The CFL number compared to the explicit RK4 method is about 250. The top figures show the 
computational mesh and color contours of the Mach number, from to 0.3. The bottom left plot compares the lift and drag 
coefficient with a nodal DG scheme, indicating good agreement until the small differences between the schemes have grown 
enough to make the solutions hard to compare. The bottom right plot shows the performance of the nonlinear Newton solver, 
and in particular how it only recomputes the Jacobian matrices once in about every 100th solve. 

6. Conclusions 

We have presented a new line-based DG method for first and second-order systems of equations. The 
scheme has a simple structure, with only one-dimensional integrals and standard Riemann solvers applied 
point-wise. Compared to the standard nodal DG method, this gives a simpler assembly process and a 
fundamentally different sparsity structure, which we used to develop efficient matrix-based implicit solvers. 
Compared to collocation based methods such as the DG spectral element method, it uses fully consistent 
integration along each coordinate-direction, and it slightly reduces the connectivities to neighboring elements 
by the choice of solution nodes. We showed that the accuracy of the discretizations are very similar to the 
standard DG method, and we demonstrated a stiff LES-type flow simulation with high-order DIRK time- 
integration with Newton-Krylov solvers and re-used Jacobians. 

A number of further developments are needed to make the scheme competitive for real-world problems. 
We have not addressed the issue of nonlinear stability for under-resolved features, including shock capturing, 
where approaches such as artificial viscosity and limiting could be adopted. For the solvers, our simple block- 
Jacobi preconditioner can be much improved upon, using e.g. multigrid and ILU techniques. Finally, for 
large problems the implementation needs to be parallelized, in particular for the new generation of multicore 
and GPU chips where memory bandwidth is limited. Here the high sparsity of the Line-DG scheme might 
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have additional benefits over the standard nodal DG scheme. 
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